Using Metatab Resources In Pandas

There are two ways to use Metatab data package resources in Pandas. One is to load the CSV files directly, which is easy to do if the package is published to a repository. The other, which is usually the better choice because it also gives you the package documentation and schema, is to use the metatab module to load the package metadata and create dataframes.

Using CSV Files Directly

The simplest way to use a file in a Metatab package is to load its CSV file directly. You can get the CSV file URL from the data repository page, such as this page for the ADOD Prevalence data in the San Diego Elder Dementia dataset.

While this is simple and portable, it does not give you the features of Metatab, such as built-in schema documentation.



In [3]:

    
import pandas as pd

df = pd.read_csv('http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/adod-prevalence.csv')

df.head()

    Out[3]:

                    region  adod_prevelance_2012  adod_prevelance_2020  adod_prevelance_2030
    0    Central San Diego                3193.0                  4958                  6424
    1             Mid-City                3136.0                  3698                  4227
    2  Southeast San Diego                3203.0                  4160                  4985
    3                 East               14865.0                 18410                 21489
    4               Alpine                 339.0                   426                   555
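
In the output above, adod_prevelance_2012 is displayed as a float (3193.0), while the 2020 and 2030 columns are integers. Pandas infers each column's type from the CSV text, and a common cause of this pattern is a missing value somewhere in the column, although nothing here confirms that this is the cause for this file. The following is a minimal sketch of supplying the types by hand, which is what a direct CSV load leaves up to you. It assumes a small inline CSV rather than the remote file: the first two rows are copied from the output above, and the "Example Region" row with its empty cell is invented for illustration.

```python
import io

import pandas as pd

# Inline stand-in for the remote file. The first two rows are copied from the
# output above; the "Example Region" row, with its empty 2012 cell, is invented
# purely to illustrate how a missing value changes the inferred type.
SAMPLE_CSV = """region,adod_prevelance_2012,adod_prevelance_2020,adod_prevelance_2030
Central San Diego,3193,4958,6424
Mid-City,3136,3698,4227
Example Region,,100,120
"""

# Default inference: a numeric column containing a missing value becomes float64.
inferred = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Requesting the nullable integer dtype keeps whole numbers and marks the gap as <NA>.
count_columns = [c for c in inferred.columns if c.startswith("adod_prevelance_")]
typed = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={c: "Int64" for c in count_columns})

print(inferred.dtypes)
print(typed.dtypes)
```

The same dtype mapping could be passed when reading the real URL, provided the column names match those shown above.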

Using the Metatab Package

The second way to access a package is to use the metatab Python package. This method requires installing the package, but it has an important advantage: it gives you direct access to the package and dataset documentation. You can load any type of Metatab package with the open_package() function, but for the highest performance, you should use the CSV package. Opening a CSV package loads only the metadata and the resources you need, while using a ZIP or Excel package requires downloading the entire package first.

To find the CSV package in a package that is published to a CKAN repository, look for a CSV file with the description "CSV Package Metadata in Metatab format". For the ADOD package, this file is named sandiegocounty.gov-adod-2012-sra-3.csv.

Opening the package returns a Metatab document object. If you display it in Jupyter, the output cell will display the package documentation.



In [7]:

    
import metatab
doc = metatab.open_package('http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3.csv')
doc

    Out[7]:





San Diego Elder Dementia
sandiegocounty.gov-adod-2012-sra-3
Current (2012) ADOD and General population data along with projections for 2020 and 2030 for San Diego county
Documentation

SD County HHSA Reports None
SD County HHSA ADOD Packet Upates Updates to the SD HHSA ADOD profiles in the county. All data is extracted from this document
Contacts

Origin:  County of San Diego Health and Human Services Agency
Creator: Lesie Ray 
Wrangler: Rashmi Keshava Iyengar San Diego Regional Data Library
Wrangler: Eric Busboom Civic Knowledge
Resources

adod-prevalence - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/adod-prevalence.csv Table 1. Estimates of Prevalence of Alzheimer's Disease and Other Dementias by Subregional Area, 55 Years and Over, San Diego County, 2012 - 2030
hospital-discharge - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/hospital-discharge.csv Table 2. Number of Emergency Department or Hospital Discharged Patients with Any Mention of Alzheimer's Disease and Other Dementias by Subregional Area, 55 Years and Over, San Diego County, 2012 - 2030
elder-population-2012 - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/elder-population-2012.csv Table 3. 2012 Population by Age Group and Subregional Area, 55 Years and Over, San Diego County
elder-population-2020 - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/elder-population-2020.csv Table 4. 2020 Population Projections by Age Group and Subregional Area, 55 Years and Over, San Diego County
elder-population-2030 - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/elder-population-2030.csv Table 5. 2030 Population Projections by Age Group and Subregional Area, 55 Years and Over, San Diego County

The .resource() method returns one of the resources. Displaying it shows the resource documentation.



In [4]:

    
r = doc.resource('adod-prevalence')
r

    Out[4]:




adod-prevalence
http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/adod-prevalence.csv

Header                Type     Description
region                text
adod_prevelance_2012  integer
adod_prevelance_2020  integer
adod_prevelance_2030  integer

Once you have a resource, use the .dataframe() method to get a Pandas dataframe.



In [6]:

    
df = r.dataframe()
df.head()

    Out[6]:

                    region  adod_prevelance_2012  adod_prevelance_2020  adod_prevelance_2030
    0    Central San Diego                3193.0                  4958                  6424
    1             Mid-City                3136.0                  3698                  4227
    2  Southeast San Diego                3203.0                  4160                  4985
    3                 East               14865.0                 18410                 21489
    4               Alpine                 339.0                   426                   555
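
The schema displayed for the resource can double as a checklist for the dataframe. The sketch below assumes the Header and Type pairs are copied by hand from the resource display above; reading them programmatically from the Metatab resource object is not covered here. The helper mismatched_columns and the mapping from Metatab type names to Pandas dtype checks are hypothetical, written for this illustration, and are not part of the metatab package.

```python
import pandas as pd
from pandas.api import types as ptypes

# Header -> Type pairs copied by hand from the resource display above.
DECLARED_TYPES = {
    "region": "text",
    "adod_prevelance_2012": "integer",
    "adod_prevelance_2020": "integer",
    "adod_prevelance_2030": "integer",
}

# Map each declared type name to a pandas dtype predicate. This mapping is an
# assumption of the sketch, not a Metatab feature.
TYPE_CHECKS = {
    "text": lambda s: ptypes.is_object_dtype(s) or ptypes.is_string_dtype(s),
    "integer": ptypes.is_integer_dtype,
}


def mismatched_columns(df, declared):
    """Return {column: problem} for columns that are missing or fail their declared type check."""
    problems = {}
    for column, type_name in declared.items():
        if column not in df.columns:
            problems[column] = "missing"
        elif not TYPE_CHECKS[type_name](df[column]):
            problems[column] = str(df[column].dtype)
    return problems


# The first two rows of df.head() as shown above, including the float 2012 column.
df = pd.DataFrame({
    "region": ["Central San Diego", "Mid-City"],
    "adod_prevelance_2012": [3193.0, 3136.0],
    "adod_prevelance_2020": [4958, 3698],
    "adod_prevelance_2030": [6424, 4227],
})

problems = mismatched_columns(df, DECLARED_TYPES)
print(problems)
```

With the values shown above, the check reports only adod_prevelance_2012, which the schema declares as integer but the dataframe holds as a float.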


